Adaptive Feature Abstraction for Translating Video to Language

نویسندگان

  • Yunchen Pu
  • Martin Renqiang Min
  • Zhe Gan
  • Lawrence Carin
چکیده

Previous models for video captioning often use the output from a specific layer of a Convolutional Neural Network (CNN) as video representations, preventing them from modeling rich, varying context-dependent semantics in video descriptions. In this paper, we propose a new approach to generating adaptive spatiotemporal representations of videos for a captioning task. For this purpose, novel attention mechanisms with spatiotemporal alignment is employed to adaptively and sequentially focus on different layers of CNN features (levels of feature “abstraction”), as well as local spatiotemporal regions of the feature maps at each layer. Our approach is evaluated on three benchmark datasets: YouTube2Text, M-VAD and MSR-VTT. Along with visualizing the results and how the model works, these experiments quantitatively demonstrate the effectiveness of the proposed adaptive spatiotemporal feature abstraction for translating videos to sentences with rich semantics.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Adaptive Feature Abstraction for Translating Video to Text

Previous models for video captioning often use the output from a specific layer of a Convolutional Neural Network (CNN) as video features. However, the variable contextdependent semantics in the video may make it more appropriate to adaptively select features from the multiple CNN layers. We propose a new approach for generating adaptive spatiotemporal representations of videos for the captioni...

متن کامل

Video Abstraction in H.264/AVC Compressed Domain

Video abstraction allows searching, browsing and evaluating videos only by accessing the useful contents. Most of the studies are using pixel domain, which requires the decoding process and needs more time and process consuming than compressed domain video abstraction. In this paper, we present a new video abstraction method in H.264/AVC compressed domain, AVAIF. The method is based on the norm...

متن کامل

Adaptive Spectral Separation Two Layer Coding with Error Concealment for Cell Loss Resilience

This paper addresses the issue of cell loss and its consequent effect on video quality in a packet video system, and examines possible compensative measures. In the system's enconder, adaptive spectral separation is used to develop a two-layer coding scheme comprising a high priority layer to carry essential video data and a low priority layer with data to enhance the video image. A two-step er...

متن کامل

An Efficient Adaptive Boundary Matching Algorithm for Video Error Concealment

Sending compressed video data in error-prone environments (like the Internet and wireless networks) might cause data degradation. Error concealment techniques try to conceal the received data in the decoder side. In this paper, an adaptive boundary matching algorithm is presented for recovering the damaged motion vectors (MVs). This algorithm uses an outer boundary matching or directional tempo...

متن کامل

Content-based retrieval from digital video

There is already a huge demand for efficient image indexing and content-based retrieval. With TV going digital, advances in real-time video decompression, easy access to the Internet and the availability of cheap mass storage and fast graphics adaptor cards, digital video will become the next “big” media. Unfortunately, automatic indexing and feature extraction from digital video is even harder...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1611.07837  شماره 

صفحات  -

تاریخ انتشار 2016